keyword search
- Oceania > Australia > New South Wales (0.04)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
KS-GNN: Keywords Search over Incomplete Graphs via Graphs Neural Network
Keyword search is a fundamental task to retrieve information that is the most relevant to the query keywords. Keyword search over graphs aims to find subtrees or subgraphs containing all query keywords ranked according to some criteria. Existing studies all assume that the graphs have complete information. However, real-world graphs may contain some missing information (such as edges or keywords), thus making the problem much more challenging. To solve the problem of keyword search over incomplete graphs, we propose a novel model named KS-GNN based on the graph neural network and the auto-encoder. By considering the latent relationships and the frequency of different keywords, the proposed KS-GNN aims to alleviate the effect of missing information and is able to learn low-dimensional representative node embeddings that preserve both graph structure and keyword features. Our model can effectively answer keyword search queries with linear time complexity over incomplete graphs. The experiments on four real-world datasets show that our model consistently achieves better performance than state-of-the-art baseline methods in graphs having missing information.
- Oceania > Australia > New South Wales (0.05)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
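The KS-GNN abstract above describes answering keyword queries from learned node embeddings in linear time. A minimal sketch of that retrieval step follows, assuming the embeddings already exist. The toy vectors, the averaging query encoder, and the names `embed_query` and `top_k_nodes` are illustrative assumptions, not the authors' model or code.

```python
import heapq
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embed_query(keywords, keyword_vectors):
    """Toy query encoder (assumption): average the vectors of the known keywords."""
    known = [keyword_vectors[k] for k in keywords if k in keyword_vectors]
    if not known:
        raise ValueError("query contains no known keyword")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def top_k_nodes(query_vec, node_embeddings, k=2):
    """Score every node once; a single scan over the nodes for fixed k."""
    best = heapq.nlargest(
        k, node_embeddings.items(), key=lambda item: cosine(query_vec, item[1])
    )
    return [node for node, _ in best]

# Hand-made 2-d vectors standing in for learned embeddings (assumption).
keyword_vectors = {"graph": [1.0, 0.0], "search": [0.0, 1.0], "speech": [-1.0, 0.0]}
node_embeddings = {"a": [0.9, 0.8], "b": [1.0, 0.0], "c": [-1.0, 0.1]}

query_vec = embed_query(["graph", "search"], keyword_vectors)
result = top_k_nodes(query_vec, node_embeddings, k=2)
```

Because each node is scored independently, the cost grows with the number of nodes rather than with the number of candidate subgraphs, which is the property the abstract highlights.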
ClusterTalk: Corpus Exploration Framework using Multi-Dimensional Exploratory Search
Chouhan, Ashish, Mandour, Saifeldin, Gertz, Michael
Exploratory search of large text corpora is essential in domains like biomedical research, where large amounts of research literature are continuously generated. This paper presents ClusterTalk, a framework for corpus exploration using multi-dimensional exploratory search (the demo video and source code are available at https://github.com/achouhan93/ClusterTalk). Our system integrates document clustering with faceted search, allowing users to interactively refine their exploration and ask corpus- and document-level queries. Compared to traditional one-dimensional search approaches like keyword search or clustering, this system improves the discoverability of information by encouraging a deeper interaction with the corpus. We demonstrate the functionality of the ClusterTalk framework on four million PubMed abstracts covering a four-year time frame.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
Hybrid Semantic Search: Unveiling User Intent Beyond Keywords
Ahluwalia, Aman, Sutradhar, Bishwajit, Ghosh, Karishma, Yadav, Indrapal, Sheetal, Arpan, Patil, Prashant
This paper addresses the limitations of traditional keyword-based search in understanding user intent and introduces a novel hybrid search approach that leverages the strengths of non-semantic search engines, Large Language Models (LLMs), and embedding models. The proposed system integrates keyword matching, semantic vector embeddings, and LLM-generated structured queries to deliver highly relevant and ...
At its core, semantic search hinges on two crucial components. The first, the search function, acts similarly to traditional search engines [1] by identifying and ranking documents relevant to a user's query within a vast collection of information (corpus). However, semantic search goes beyond this basic functionality with its second component: semantic understanding. This is where Transformers come into play, allowing the system to delve deeper than keyword matching.
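To make the idea of combining keyword matching with semantic vector embeddings concrete, here is a minimal sketch that fuses the two signals with a weighted sum. The weight `alpha`, the term-overlap keyword score, the hand-made vectors, and the function names are all assumptions for illustration; the LLM-generated structured queries mentioned in the abstract are not modeled here.

```python
from math import sqrt

def keyword_score(query, text):
    """Fraction of query terms that appear in the document text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * embedding similarity.

    docs is a list of (doc_id, text, vector) tuples; alpha is a hypothetical
    mixing weight, not a value taken from the paper.
    """
    scored = []
    for doc_id, text, vec in docs:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]

# Toy corpus; the 2-d vectors stand in for real embedding-model output (assumption).
docs = [
    ("d1", "cheap flights to paris", [1.0, 0.0]),
    ("d2", "affordable airfare for france trips", [0.9, 0.1]),
    ("d3", "paris cooking recipes", [0.0, 1.0]),
]
query = "cheap flights paris"
query_vec = [1.0, 0.0]

hybrid_order = hybrid_rank(query, query_vec, docs, alpha=0.5)
keyword_only_order = hybrid_rank(query, query_vec, docs, alpha=1.0)
```

With keyword matching alone, `d2` ranks last because it shares no terms with the query; adding the embedding term moves it above `d3`, which shares a word but not the intent.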
Path-LLM: A Shortest-Path-based LLM Learning for Unified Graph Representation
Shang, Wenbo, Zhu, Xuliang, Huang, Xin
Unified graph representation learning aims to produce node embeddings, which can be applied to multiple downstream applications. However, existing studies based on graph neural networks and language models either require extensive training for specific downstream predictions or capture only shallow semantic features. In this work, we propose a novel Path-LLM model to learn unified graph representation, which leverages a powerful large language model (LLM) to incorporate our proposed path features. Our Path-LLM framework consists of several well-designed techniques. First, we develop a new mechanism of long-to-short shortest path (L2SP) selection, which covers essential connections between different dense groups. An in-depth comparison of different path selection plans is offered to illustrate the strength of our designed L2SP. Then, we design path textualization to obtain L2SP-based training texts. Next, we feed the texts into a self-supervised LLM training process to learn embeddings. Extensive experiments on benchmarks validate the superiority of Path-LLM against the state-of-the-art WalkLM method on two classical graph learning tasks (node classification and link prediction) and one NP-hard graph query processing task (keyword search), while saving more than 90% of training paths.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Hong Kong (0.04)
- Asia > Singapore (0.04)
- Research Report (1.00)
- Overview (0.93)
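The Path-LLM abstract above mentions turning shortest paths into training texts. The sketch below shows only the generic building blocks, a breadth-first shortest path and a simple textualization, on a toy graph. It does not implement the L2SP selection rule, whose details the abstract does not give; the graph, the node descriptions, and the connective phrase are assumptions.

```python
from collections import deque

def shortest_path(adj, source, target):
    """Breadth-first search on an unweighted graph; returns a node list or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk parent links back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adj.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

def textualize(path, node_text):
    """Join the node descriptions along a path into one training sentence."""
    return " is connected to ".join(node_text[n] for n in path)

# Toy bibliographic graph (assumption): papers p1, p2, author a1, venue v1.
adj = {
    "p1": ["a1"],
    "a1": ["p1", "p2"],
    "p2": ["a1", "v1"],
    "v1": ["p2"],
}
node_text = {
    "p1": "paper on keyword search",
    "a1": "author A",
    "p2": "paper on graph embeddings",
    "v1": "venue V",
}

path = shortest_path(adj, "p1", "v1")
sentence = textualize(path, node_text)
```

Each resulting sentence could then serve as one input text for language-model training, which is the role the abstract assigns to path textualization.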
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units
Yusuf, Bolaji, Černocký, Jan "Honza", Saraçlar, Murat
End-to-end (E2E) keyword search (KWS) has emerged as an alternative and complementary approach to conventional keyword search which depends on the output of automatic speech recognition (ASR) systems. While E2E methods greatly simplify the KWS pipeline, they generally have worse performance than their ASR-based counterparts, which can benefit from pretraining with untranscribed data. In this work, we propose a method for pretraining E2E KWS systems with untranscribed data, which involves using acoustic unit discovery (AUD) to obtain discrete units for untranscribed data and then learning to locate sequences of such units in the speech. We conduct experiments across languages and AUD systems: we show that finetuning such a model significantly outperforms a model trained from scratch, and the performance improvements are generally correlated with the quality of the AUD system used for pretraining.
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
Written Term Detection Improves Spoken Term Detection
Yusuf, Bolaji, Saraçlar, Murat
End-to-end (E2E) approaches to keyword search (KWS) are considerably simpler in terms of training and indexing complexity when compared to approaches which use the output of automatic speech recognition (ASR) systems. This simplification, however, has drawbacks due to the loss of modularity. In particular, where ASR-based KWS systems can benefit from external unpaired text via a language model, current formulations of E2E KWS systems have no such mechanism. Therefore, in this paper, we propose a multitask training objective which allows unpaired text to be integrated into E2E KWS without complicating indexing and search. In addition to training an E2E KWS model to retrieve text queries from spoken documents, we jointly train it to retrieve text queries from masked written documents. We show empirically that this approach can effectively leverage unpaired text for KWS, with significant improvements in search performance across a wide variety of languages. Our analysis indicates that these improvements are achieved because the proposed method improves document representations for words in the unpaired text. Finally, we show that the proposed method can be used for domain adaptation in settings where in-domain paired data is scarce or nonexistent.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)